Adaptive Randomized Dimension Reduction on Massive Data
Authors
Abstract
The scalability of statistical estimators is of increasing importance in modern applications. One approach to implementing scalable algorithms is to compress data into a low-dimensional latent space using dimension reduction methods. In this paper we develop an approach for dimension reduction that exploits the assumption of low-rank structure in high-dimensional data to gain both computational and statistical advantages. We adapt recent randomized low-rank approximation algorithms to provide an efficient solution to principal component analysis (PCA), and we use this efficient solver to improve parameter estimation in large-scale linear mixed models (LMM) for association mapping in statistical and quantitative genomics. A key observation in this paper is that randomization serves a dual role, improving both computational and statistical performance by implicitly regularizing the covariance matrix estimate of the random effect in an LMM. These statistical and computational advantages are highlighted in our experiments on simulated data and large-scale genomic studies.
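The randomized low-rank PCA solver the abstract refers to follows the general randomized range-finder scheme (sketch the column space with a random test matrix, then solve a small exact problem). The sketch below is a minimal illustration of that generic technique, not the paper's specific adaptive algorithm; the function name, the oversampling parameter, and the default values are illustrative assumptions.

```python
import numpy as np

def randomized_pca(X, k, oversample=10, seed=0):
    """Approximate the top-k principal components of X (n samples x d
    features) via a randomized range finder. Illustrative sketch, not
    the paper's adaptive algorithm."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xc = X - X.mean(axis=0)                       # center the features
    # Sketch the column space with a Gaussian test matrix (oversampled).
    Omega = rng.standard_normal((d, k + oversample))
    Y = Xc @ Omega                                # n x (k + oversample)
    Q, _ = np.linalg.qr(Y)                        # orthonormal basis for range(Y)
    # Project onto the small subspace and take an exact SVD there.
    B = Q.T @ Xc                                  # (k + oversample) x d
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    # Rows of Vt approximate principal directions; s**2/(n-1) the variances.
    return Vt[:k], (s[:k] ** 2) / (n - 1)

# Usage: low-rank synthetic data, recover 5 leading components.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 20)) @ rng.standard_normal((20, 50))
comps, var = randomized_pca(X, k=5)
```

The cost is dominated by the two passes over `X` (the sketch and the projection), which is what makes this family of methods attractive on massive data compared with a full eigendecomposition of the d x d covariance.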
Similar Resources
Improved Analysis of the subsampled Randomized Hadamard Transform
This paper presents an improved analysis of a structured dimension-reduction map called the subsampled randomized Hadamard transform. This argument demonstrates that the map preserves the Euclidean geometry of an entire subspace of vectors. The new proof is much simpler than previous approaches, and it offers—for the first time—optimal constants in the estimate on the number of dimensions requi...
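To make the map concrete, here is a minimal sketch of a subsampled randomized Hadamard transform: a random sign flip, a fast Walsh-Hadamard transform, and uniform column subsampling. It assumes the ambient dimension is a power of two (pad otherwise); the function name and scaling convention are illustrative, not taken from the cited paper.

```python
import numpy as np

def srht(X, r, seed=0):
    """Subsampled randomized Hadamard transform: map the d columns of X
    down to r columns while approximately preserving Euclidean geometry.
    Illustrative sketch; assumes d is a power of two."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    assert d & (d - 1) == 0, "d must be a power of two"
    signs = rng.choice([-1.0, 1.0], size=d)       # random diagonal D
    Y = X * signs
    # In-place fast Walsh-Hadamard transform on columns, O(n d log d).
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            a = Y[:, i:i + h].copy()
            b = Y[:, i + h:i + 2 * h].copy()
            Y[:, i:i + h] = a + b
            Y[:, i + h:i + 2 * h] = a - b
        h *= 2
    # Uniformly subsample r columns and rescale so norms are preserved
    # in expectation (unnormalized FWHT contributes a sqrt(d) factor).
    cols = rng.choice(d, size=r, replace=False)
    return Y[:, cols] * np.sqrt(d / r) / np.sqrt(d)

# Usage: sketch 64-dimensional rows down to 16 dimensions.
X = np.random.default_rng(1).standard_normal((100, 64))
Y = srht(X, 16)
```

Because the Hadamard transform is applied with a fast recursion, the sketch costs O(n d log d) rather than the O(n d r) of a dense Gaussian projection, which is the structured map's main appeal.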
Data-Adaptive Reduced-Dimension Robust Beamforming Algorithms
We present low complexity, quickly converging robust adaptive beamformers that combine robust Capon beamformer (RCB) methods and data-adaptive Krylov subspace dimensionality reduction techniques. We extend a recently proposed reduced-dimension RCB framework, which ensures proper combination of RCBs with any form of dimensionality reduction that can be expressed using a full-rank dimension reduc...
Persistent homology for low-complexity models
We show that recent results on randomized dimension reduction schemes that exploit structural properties of data can be applied in the context of persistent homology. In the spirit of compressed sensing, the dimension reduction is determined by the Gaussian width of a structure associated to the data set, rather than its size, and such a reduction can be computed efficiently. We further relate ...
Universality laws for randomized dimension reduction, with applications
Dimension reduction is the process of embedding high-dimensional data into a lower dimensional space to facilitate its analysis. In the Euclidean setting, one fundamental technique for dimension reduction is to apply a random linear map to the data. This dimension reduction procedure succeeds when it preserves certain geometric features of the set. The question is how large the embedding dimens...
Recognition and Analysis of Massive Open Online Courses (MOOCs) Aesthetics for the Sustainable Education
The present study was conducted to recognize and analyze the Massive Open Online Course (MOOC) aesthetics for sustainable education. For this purpose, two methods of the exploratory search (qualitative) and the questionnaire (quantitative) were used for data collection. The research sample in the qualitative section included the electronic resources related to the topic and in the quantitative ...
Journal: Journal of Machine Learning Research
Volume: 18
Pages: -
Publication year: 2017